Filter Bank Feature Extraction for Gaussian Mixture Model Speaker Recognition
Authors
Abstract
Speaker recognition is the task of identifying an individual from their voice. Typically this task is performed in two consecutive stages: feature extraction and classification. Using a Gaussian Mixture Model (GMM) classifier, different filter-bank configurations were compared as feature extraction techniques for speaker recognition. The filter banks were also compared with the popular Mel-Frequency Cepstral Coefficients (MFCC) in terms of speaker recognition performance on the CSLU Speaker Recognition Corpus. The empirical results show that a uniform filter bank outperforms both the mel-scale filter bank and MFCC as a feature extraction technique. These results challenge the notion that the mel scale is an appropriate division of the spectrum for speaker recognition.
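As a rough illustration of the two spectral divisions compared in the abstract, the Python sketch below builds band edges that are evenly spaced either in Hz (uniform) or on the mel scale, and turns one power-spectrum frame into log filter-bank energies. It is not the paper's implementation: the function names, the triangular filter shape, the 2595 log10(1 + f/700) mel formula, the 8 kHz sampling rate, and the choice of 20 filters are all assumptions made for the example.

```python
import math


def hz_to_mel(f_hz):
    """Hz to mel, using one common convention (assumed, not from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def band_edges(n_filters, f_low, f_high, scale="uniform"):
    """Return n_filters + 2 edge frequencies (Hz), evenly spaced on `scale`."""
    steps = n_filters + 1
    if scale == "uniform":
        return [f_low + (f_high - f_low) * i / steps for i in range(steps + 1)]
    if scale == "mel":
        lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
        return [mel_to_hz(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    raise ValueError("scale must be 'uniform' or 'mel'")


def triangular_filter_bank(edges, n_bins, sample_rate):
    """One triangular weight vector per filter over bins spanning 0..sample_rate/2."""
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)
    bank = []
    for k in range(1, len(edges) - 1):
        left, centre, right = edges[k - 1], edges[k], edges[k + 1]
        weights = []
        for b in range(n_bins):
            f = b * bin_hz
            if left < f <= centre:
                weights.append((f - left) / (centre - left))   # rising slope
            elif centre < f < right:
                weights.append((right - f) / (right - centre))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def log_filter_bank_energies(power_spectrum, bank, floor=1e-10):
    """Log of the weighted energy in each band; `floor` avoids log(0)."""
    return [
        math.log(max(sum(w * p for w, p in zip(weights, power_spectrum)), floor))
        for weights in bank
    ]


# Hypothetical settings: 8 kHz audio, 129 spectrum bins, 20 filters.
uniform_edges = band_edges(20, 0.0, 4000.0, "uniform")
mel_edges = band_edges(20, 0.0, 4000.0, "mel")
flat_frame = [1.0] * 129  # stand-in for a real power spectrum
features = log_filter_bank_energies(
    flat_frame, triangular_filter_bank(uniform_edges, 129, 8000.0)
)
print(len(features), "uniform log filter-bank energies")
```

In a full system, feature vectors like these would be computed per frame and modelled with one GMM per speaker. With the mel spacing the bands are narrow at low frequencies and widen toward the top of the spectrum, whereas the uniform spacing gives every band the same width in Hz.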
Similar resources
Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks
A state-of-the-art speaker identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter ban...
Language and Text-Independent Speaker Identification System Using GMM
This paper motivates the use of the Dynamic Mel-Frequency Cepstral Coefficient (DMFCC) feature, and of the combination of DMFCC and MFCC features, for robust language- and text-independent speaker identification. The MFCC feature, modeled on the human auditory system, has been widely used for speaker recognition because of its low vulnerability to noise perturbation and its small session variability. Bu...
The Wavelet and Fourier Transforms in Feature Extraction for Text-Dependent, Filterbank-Based Speaker Recognition
An important step in speaker recognition is extracting features from raw speech that capture the unique characteristics of each speaker. The most widely used method of obtaining these features is the filterbank-based Mel Frequency Cepstral Coefficients (MFCC) approach. Typically, an important step in the process is the use of the discrete Fourier transform (DFT) to compute the spectrum ...
A Wavelet Packet and Mel-Frequency Cepstral Coefficients-Based Feature Extraction Method for Speaker Identification
One of the most widely used approaches for feature extraction in speaker recognition is the filter bank-based Mel Frequency Cepstral Coefficients (MFCC) approach. The main goal of feature extraction in this context is to extract features from raw speech that capture the unique characteristics of a particular individual. During the feature extraction process, the discrete Fourier transform (DFT...
Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition
The standard Mel frequency cepstrum coefficient (MFCC) computation technique uses the discrete cosine transform (DCT) to decorrelate the log energies of the filter bank outputs. The use of the DCT is reasonable here because the covariance matrix of the Mel filter bank log energies (MFLE) can be compared with that of a highly correlated Markov-I process. This full-band based MFCC computation technique where each of the fi...
Publication date: 2002